Skip to content

Conversation

phananh1010
Copy link
Owner

@phananh1010 phananh1010 commented Sep 30, 2025

Single commit with tree=051723c20ef007ff63b8b530a5b3805947404935^{tree}, parent=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be. Exact snapshot of upstream PR head. No conflict resolution attempted.

Summary by CodeRabbit

  • New Features
    • Dynamic target index selection for OTEL metrics using attributes and receiver name, with support for explicit index overrides.
    • Metric documents now include data_stream metadata (type, dataset, namespace) when applicable.
  • Refactor
    • Centralized sanitization of data stream fields for consistent normalization.
    • Improved grouping and routing via a typed target index.
  • Tests
    • Added extensive tests for index routing, receiver-based routing, defaulting, sanitization, and ignoring routing-related attributes.

BASE=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be
HEAD=051723c20ef007ff63b8b530a5b3805947404935
Branch=main
@coderabbitai
Copy link

coderabbitai bot commented Sep 30, 2025

Walkthrough

Centralizes data stream field sanitization in DataStream and updates RerouteProcessor to use it. Introduces TargetIndex for deriving target indexes and integrates it into OTEL data point grouping and metric document building. Adjusts visibility in MappingHints. Adds comprehensive tests for TargetIndex, grouping behavior, document data_stream output, and ignored attributes.

Changes

Cohort / File(s) Summary
Centralized sanitization in DataStream
server/src/main/java/.../DataStream.java
Adds public static sanitizers: sanitizeType, sanitizeDataset, sanitizeNamespace. Implements shared normalization with patterns, length cap, and replacement.
RerouteProcessor uses DataStream sanitizers
modules/ingest-common/src/main/java/.../RerouteProcessor.java
Removes local sanitization logic and constants; delegates to DataStream::sanitizeType/dataset/namespace in DataStreamValueSource factories.
Target index derivation (new feature)
x-pack/plugin/otel-data/src/main/java/.../datapoint/TargetIndex.java
New class encapsulating target index resolution from attributes/scope/resources and optional receiver; supports data stream defaults and explicit index. Provides helpers and accessors.
Grouping uses TargetIndex
x-pack/plugin/otel-data/src/main/java/.../datapoint/DataPointGroupingContext.java
Changes map keys and DataPointGroup field/constructor from String to TargetIndex; extracts receiverName from scope; evaluates TargetIndex per group.
Metric document data_stream output
x-pack/plugin/otel-data/src/main/java/.../docbuilder/MetricDocumentBuilder.java
Emits data_stream {type,dataset,namespace} when TargetIndex indicates a data stream; ignores TargetIndex-related attributes during attribute rendering.
Mapping hints key visibility
x-pack/plugin/otel-data/src/main/java/.../docbuilder/MappingHints.java
Makes MAPPING_HINTS private; adds public isMappingHintsAttribute(String).
Tests: TargetIndex and routing
x-pack/plugin/otel-data/src/test/java/.../datapoint/TargetIndexTests.java
.../datapoint/DataPointGroupingContextTests.java
Adds tests for TargetIndex evaluation, receiver-based routing, grouping across indices. Validates sanitization and precedence rules.
Tests: Metric document data_stream
x-pack/plugin/otel-data/src/test/java/.../docbuilder/MetricDocumentBuilderTests.java
Uses TargetIndex.defaultMetrics(); asserts data_stream fields in output.
Tests: Ignored attributes in TSID funnel
x-pack/plugin/otel-data/src/test/java/.../tsid/AttributeListTsidFunnelTests.java
Adds test ensuring target-index-related keys are ignored by the funnel.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor OTELExporter as OTEL Exporter
  participant Ingest as Ingest Pipeline
  participant Group as DataPointGroupingContext
  participant Target as TargetIndex
  participant Builder as MetricDocumentBuilder
  participant ES as Elasticsearch

  OTELExporter->>Ingest: Send datapoints
  Ingest->>Group: Group datapoints by scope/resource
  Group->>Group: extract receiverName from scope name
  Group->>Target: evaluate(type, attributes, receiverName, scopeAttrs, resourceAttrs)
  Target-->>Group: TargetIndex (index or data_stream fields)
  Group->>Builder: DataPointGroup with TargetIndex
  Builder->>Builder: buildDataStream if isDataStream()
  Builder->>ES: Index document to TargetIndex.index()
  ES-->>OTELExporter: Ack
Loading
sequenceDiagram
  autonumber
  participant RP as RerouteProcessor
  participant DS as DataStream

  RP->>DS: sanitizeType(type)
  RP->>DS: sanitizeDataset(dataset)
  RP->>DS: sanitizeNamespace(namespace)
  DS-->>RP: sanitized values
  RP->>RP: proceed with routing using sanitized values
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paw—new streams aligned,
Indices hop where rules are defined.
A nibble of names, sanitized neat,
TargetIndex guides my fleet.
From scope to doc, we bound in time—
Carrots cached, mappings sublime. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.51% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately describes that this pull request is mirroring the upstream elastic#133903 for AI review and indicates it is a snapshot of the HEAD tree, making it clear to reviewers what the PR’s purpose is.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch eval/pr-133903-diff

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)

167-179: Optional: also assert data_stream presence in empty-fields test.

Since you pass TargetIndex.defaultMetrics(), consider asserting data_stream.* exists even when resource/scope are empty, to prevent regressions.

@@
-        documentBuilder.buildMetricDocument(builder, dataPointGroup);
+        documentBuilder.buildMetricDocument(builder, dataPointGroup);
@@
-        assertThat(doc.evaluate("unit"), is(nullValue()));
+        assertThat(doc.evaluate("unit"), is(nullValue()));
+        assertThat(doc.evaluate("data_stream.type"), equalTo("metrics"));
+        assertThat(doc.evaluate("data_stream.dataset"), equalTo("generic.otel"));
+        assertThat(doc.evaluate("data_stream.namespace"), equalTo("default"));
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)

63-66: Consider honoring mapping hints instead of always passing empty.

If users provide elasticsearch.mapping.hints, propagate them to dynamic template selection.

-            String dynamicTemplate = dataPoint.getDynamicTemplate(MappingHints.empty());
+            // Parse once per group if needed; here kept local for clarity
+            String dynamicTemplate = dataPoint.getDynamicTemplate(
+                MappingHints.fromAttributes(dataPointGroup.dataPointAttributes())
+            );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c4e49b and c345e98.

📒 Files selected for processing (10)
  • modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (2 hunks)
  • server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (2 hunks)
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (6 hunks)
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1 hunks)
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java (2 hunks)
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (4 hunks)
  • x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java (4 hunks)
  • x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (1 hunks)
  • x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (5 hunks)
  • x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/tsid/AttributeListTsidFunnelTests.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
  • TargetIndex (21-158)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)
  • DataStream (87-2183)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
  • TargetIndex (21-158)
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)
server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)
  • DataStream (87-2183)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
  • TargetIndex (21-158)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (1)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
  • TargetIndex (21-158)
🔇 Additional comments (17)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (2)

26-26: Good switch to TargetIndex.defaultMetrics().

Using the shared default ensures tests track routing/index construction semantics consistently.

Also applies to: 77-78, 146-147, 178-179


106-108: Validating data_stream fields is spot on.

These assertions correctly reflect the derived defaults: type=metrics, dataset=generic.otel, namespace=default.

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)

53-55: Data stream block integration looks good.

Emitting data_stream only when targetIndex.isDataStream() is true matches the routing model and keeps documents clean for plain-index targets.

Also applies to: 115-124

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java (1)

30-31: No external usages of MAPPING_HINTS found; making it private is safe.
Tests use the literal "elasticsearch.mapping.hints", so they’re unaffected.

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)

12-12: Centralizing sanitization approved — DataStream.sanitizeType, sanitizeDataset, and sanitizeNamespace use the same disallowed-character patterns (type & dataset: "[\\/?"<>| ,#:-]", namespace: "[\\/?"<>| ,#:]"), enforce MAX_LENGTH = 100, and replace with “_”. No behavior drift detected.

server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)

108-118: LGTM! Clean public API design.

The three public static sanitization methods (sanitizeType, sanitizeDataset, sanitizeNamespace) provide a clear, well-structured API for data stream field normalization. The delegation to a common private helper ensures consistency.

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java (3)

62-87: LGTM! Test correctly validates target index-based grouping.

The test testGroupingDifferentTargetIndex effectively verifies that data points with different data_stream.dataset attributes are grouped separately and routed to distinct target indexes. The use of containsInAnyOrder is appropriate for unordered collection assertions.


244-266: LGTM! Test validates receiver-based routing.

The test testReceiverBasedRouting correctly verifies that the receiver name is extracted from the scope name and used to determine the target index. The expected index metrics-hostmetricsreceiver.otel-default aligns with the receiver name hostmetricsreceiver parsed from the scope path.


268-289: LGTM! Test covers edge case for receiver name extraction.

The test testReceiverBasedRoutingWithoutTrailingSlash validates the edge case where the scope name ends immediately after the receiver name (no trailing path segments). This ensures extractReceiverName correctly handles the absence of a trailing slash.

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (3)

23-37: LGTM! Test correctly validates explicit index precedence.

The test verifies that when elasticsearch.index is set, it takes precedence over data_stream.* attributes, and the result is a non-data-stream target index.


39-53: LGTM! Test validates data stream attribute handling and sanitization.

The test correctly verifies:

  1. Dataset sanitization: custom-datasetcustom_dataset (hyphen replaced), then .otel suffix added.
  2. Namespace sanitization: custom-namespace remains unchanged (hyphens allowed in namespace).

55-142: LGTM! Comprehensive test coverage for TargetIndex evaluation logic.

The test suite thoroughly validates:

  • Scope and resource attribute handling
  • Attribute precedence rules (attributes > scope > resource)
  • Receiver-based routing
  • Default value behavior
  • Sanitization of uppercase, hyphens, and special characters

All assertions align with the expected behavior of TargetIndex.evaluate and DataStream sanitization methods.

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (3)

164-194: LGTM! Receiver name extraction logic is sound.

The extractReceiverName method correctly parses the receiver name from the scope name by:

  1. Finding the /receiver/ segment
  2. Extracting the substring between /receiver/ and the next / (or end of string)
  3. Returning null if no receiver pattern is found

Edge cases are handled correctly, including scope names ending with the receiver name (no trailing slash).


220-226: LGTM! TargetIndex.evaluate call is correct.

The call to TargetIndex.evaluate correctly passes:

  • Type: TargetIndex.TYPE_METRICS ("metrics")
  • Attributes: data point, scope, and resource attributes in precedence order
  • Receiver name: extracted from the scope name

This ensures the target index is computed with all relevant context.


172-172: LGTM: StringTargetIndex consistently applied; tests and buildDataStream updated accordingly. Public API change: callers must call .index() or .toString() to get the string representation.

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (2)

21-40: API shape looks good; defaults and caching are practical

The public API and defaultMetrics() caching read well, and the data-stream vs explicit-index split is clear.


58-64: All evaluate() call sites use non-null list arguments
DataPoint.getAttributes()/getScopeAttributes()/getResourceAttributes() never return null and tests use List.of(). Safe to add requireNonNull checks for the list parameters.

Comment on lines +102 to +106
private static final int MAX_LENGTH = 100;
private static final String REPLACEMENT = "_";
private static final Pattern DISALLOWED_IN_TYPE = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Verify pattern correctness and rationale for identical TYPE and DATASET patterns.

The patterns DISALLOWED_IN_TYPE and DISALLOWED_IN_DATASET are identical, both disallowing [\\/*?\"<>| ,#:-]. However, DISALLOWED_IN_NAMESPACE differs by not disallowing hyphens (only [\\/*?\"<>| ,#:]).

Confirm that:

  1. The regex patterns correctly escape special characters (especially the backslash and forward slash).
  2. TYPE and DATASET should indeed have identical character restrictions, or if there's a functional difference that should be reflected.
  3. The choice to allow hyphens in namespace but not in type/dataset is intentional and documented.

Run the following script to check if these patterns are used consistently across the codebase and verify any related documentation:


🏁 Script executed:

#!/bin/bash
# Search for usages of these sanitization methods and any related comments
rg -n -C3 'sanitizeType|sanitizeDataset|sanitizeNamespace|DISALLOWED_IN_TYPE|DISALLOWED_IN_DATASET|DISALLOWED_IN_NAMESPACE'

Length of output: 12218


Comment on lines +120 to +127
private static String sanitizeDataStreamField(String s, Pattern disallowedInDataset) {
if (s == null) {
return null;
}
s = s.toLowerCase(Locale.ROOT);
s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
return disallowedInDataset.matcher(s).replaceAll(REPLACEMENT);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Consider edge case: empty or all-invalid input strings.

The sanitization logic correctly handles null input and applies lowercase, length truncation, and character replacement. However, consider these edge cases:

  1. Empty string input: Returns empty string after sanitization, which may not be a valid dataset/namespace/type.
  2. All-disallowed-characters input: For example, "###" becomes "___" after sanitization, which might not be semantically meaningful.

Verify that callers handle empty or sanitized-to-invalid results appropriately, or add validation to reject such inputs.


🏁 Script executed:

#!/bin/bash
# Search for callers of these sanitization methods to verify they handle edge cases
rg -n -A5 'sanitizeType\(|sanitizeDataset\(|sanitizeNamespace\(' --type java

Length of output: 3450


Validate empty sanitized values
sanitizeDataStreamField returns "" for empty or fully-disallowed inputs, but callers only default on null (e.g. TargetIndex.java 79–89), so empty results slip through. Treat empty sanitized outputs as invalid (null) or add an isEmpty() check before defaulting.

🤖 Prompt for AI Agents
In server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
around lines 120–127, sanitizeDataStreamField can return an empty string for
inputs that are empty or fully-disallowed which callers only treat null as
invalid; update the method so that after converting, trimming length and
applying the disallowed pattern, if the result is empty (result.isEmpty())
return null instead of an empty string so callers that check for null will
correctly treat fully-sanitized values as invalid.

Comment on lines +10 to +16
import io.opentelemetry.proto.common.v1.KeyValue;

import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.core.Nullable;

import java.util.List;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add Objects import to support null-checks in evaluate()

 import io.opentelemetry.proto.common.v1.KeyValue;
 
 import org.elasticsearch.cluster.metadata.DataStream;
 import org.elasticsearch.core.Nullable;
 
 import java.util.List;
+import java.util.Objects;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import io.opentelemetry.proto.common.v1.KeyValue;
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.core.Nullable;
import java.util.List;
import io.opentelemetry.proto.common.v1.KeyValue;
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.core.Nullable;
import java.util.List;
import java.util.Objects;
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 10 to 16, the evaluate() method needs null-check utilities; add the
import statement `import java.util.Objects;` at the top of the file and update
evaluate() to use Objects (e.g., Objects.requireNonNull(...) or
Objects.isNull(...) as appropriate) for null checks instead of raw null
comparisons to make the logic clearer and avoid potential NPEs.

Comment on lines +42 to +46
public static boolean isTargetIndexAttribute(String attributeKey) {
return attributeKey.equals(ELASTICSEARCH_INDEX)
|| attributeKey.equals(DATA_STREAM_DATASET)
|| attributeKey.equals(DATA_STREAM_NAMESPACE);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

NPE risk in isTargetIndexAttribute(String) when attributeKey is null

Use constant-first equals to avoid null deref.

-    public static boolean isTargetIndexAttribute(String attributeKey) {
-        return attributeKey.equals(ELASTICSEARCH_INDEX)
-            || attributeKey.equals(DATA_STREAM_DATASET)
-            || attributeKey.equals(DATA_STREAM_NAMESPACE);
-    }
+    public static boolean isTargetIndexAttribute(String attributeKey) {
+        return ELASTICSEARCH_INDEX.equals(attributeKey)
+            || DATA_STREAM_DATASET.equals(attributeKey)
+            || DATA_STREAM_NAMESPACE.equals(attributeKey);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static boolean isTargetIndexAttribute(String attributeKey) {
return attributeKey.equals(ELASTICSEARCH_INDEX)
|| attributeKey.equals(DATA_STREAM_DATASET)
|| attributeKey.equals(DATA_STREAM_NAMESPACE);
}
public static boolean isTargetIndexAttribute(String attributeKey) {
return ELASTICSEARCH_INDEX.equals(attributeKey)
|| DATA_STREAM_DATASET.equals(attributeKey)
|| DATA_STREAM_NAMESPACE.equals(attributeKey);
}
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 42 to 46, the method isTargetIndexAttribute(String) calls
attributeKey.equals(...) and will NPE if attributeKey is null; change the
comparisons to use constant-first equals (e.g.
ELASTICSEARCH_INDEX.equals(attributeKey) ||
DATA_STREAM_DATASET.equals(attributeKey) ||
DATA_STREAM_NAMESPACE.equals(attributeKey)) or use Objects.equals(constant,
attributeKey) so null attributeKey is handled safely.

Comment on lines +58 to +97
public static TargetIndex evaluate(
String type,
List<KeyValue> attributes,
@Nullable String receiverName,
List<KeyValue> scopeAttributes,
List<KeyValue> resourceAttributes
) {
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on scope.name
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
target.populateFrom(attributes);
target.populateFrom(scopeAttributes);
target.populateFrom(resourceAttributes);
if (target.index == null) {
target.type = type;
if (target.dataset == null && receiverName != null) {
target.dataset = receiverName;
}
target.dataset = DataStream.sanitizeDataset(target.dataset);
if (target.dataset == null) {
target.dataset = DEFAULT_DATASET;
}
// add otel suffix to match OTel index template
target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
target.namespace = DataStream.sanitizeNamespace(target.namespace);

if (target.namespace == null) {
target.namespace = DEFAULT_NAMESPACE;
}
target.index = target.type + "-" + target.dataset + "-" + target.namespace;
} else {
target.type = null;
target.dataset = null;
target.namespace = null;
}
return target;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Harden evaluate(): validate inputs, sanitize type, and avoid double “.otel” suffix

  • Enforce non-null for type and attribute lists.
  • Sanitize type via DataStream.sanitizeType(...) to mirror dataset/namespace handling.
  • Append OTEL suffix only if not already present.
     public static TargetIndex evaluate(
         String type,
         List<KeyValue> attributes,
         @Nullable String receiverName,
         List<KeyValue> scopeAttributes,
         List<KeyValue> resourceAttributes
     ) {
+        Objects.requireNonNull(type, "type must not be null");
+        Objects.requireNonNull(attributes, "attributes must not be null");
+        Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null");
+        Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null");
         // Order:
         // 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
         // 2. read data_stream.* from attributes, scope.attributes, resource.attributes
-        // 3. receiver-based routing based on scope.name
+        // 3. receiver-based routing based on receiverName
         // 4. use default hardcoded data_stream.* (<type>-generic-default)
         TargetIndex target = new TargetIndex();
         target.populateFrom(attributes);
         target.populateFrom(scopeAttributes);
         target.populateFrom(resourceAttributes);
         if (target.index == null) {
-            target.type = type;
+            // Sanitize type just like dataset/namespace (keeps index name safe).
+            target.type = DataStream.sanitizeType(type);
             if (target.dataset == null && receiverName != null) {
                 target.dataset = receiverName;
             }
             target.dataset = DataStream.sanitizeDataset(target.dataset);
             if (target.dataset == null) {
                 target.dataset = DEFAULT_DATASET;
             }
-            // add otel suffix to match OTel index template
-            target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+            // add otel suffix to match OTel index template (avoid double-suffix)
+            if (target.dataset.endsWith(OTEL_DATASET_SUFFIX) == false) {
+                target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+            }
             target.namespace = DataStream.sanitizeNamespace(target.namespace);
 
             if (target.namespace == null) {
                 target.namespace = DEFAULT_NAMESPACE;
             }
             target.index = target.type + "-" + target.dataset + "-" + target.namespace;

Based on relevant code snippet indicating sanitizeType exists in DataStream.

Would you confirm sanitizeType(String) is present in this tree? If not, I’ll switch to a minimal lowercase/length guard instead.


🏁 Script executed:

#!/usr/bin/env bash
rg -nP --type java "sanitizeType" -C3

Length of output: 1796


Harden evaluate(): validate inputs, sanitize type, and avoid double “.otel” suffix

  • Add non‐null checks for type, attributes, scopeAttributes, and resourceAttributes.
  • Sanitize type via DataStream.sanitizeType(type).
  • Append OTEL_DATASET_SUFFIX only if it’s not already present.
     public static TargetIndex evaluate(
         String type,
         List<KeyValue> attributes,
         @Nullable String receiverName,
         List<KeyValue> scopeAttributes,
         List<KeyValue> resourceAttributes
     ) {
+        Objects.requireNonNull(type, "type must not be null");
+        Objects.requireNonNull(attributes, "attributes must not be null");
+        Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null");
+        Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null");
         // Order:
         // 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
         // 2. read data_stream.* from attributes, scope.attributes, resource.attributes
         // 3. receiver-based routing based on receiverName
         // 4. use default hardcoded data_stream.* (<type>-generic-default)
         TargetIndex target = new TargetIndex();
         target.populateFrom(attributes);
         target.populateFrom(scopeAttributes);
         target.populateFrom(resourceAttributes);
         if (target.index == null) {
-            target.type = type;
+            target.type = DataStream.sanitizeType(type);
             if (target.dataset == null && receiverName != null) {
                 target.dataset = receiverName;
             }
             target.dataset = DataStream.sanitizeDataset(target.dataset);
             if (target.dataset == null) {
                 target.dataset = DEFAULT_DATASET;
             }
-            // add otel suffix to match OTel index template
-            target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+            // add OTEL suffix to match OTel index template (avoid double-suffix)
+            if (target.dataset.endsWith(OTEL_DATASET_SUFFIX) == false) {
+                target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
+            }
             target.namespace = DataStream.sanitizeNamespace(target.namespace);
 
             if (target.namespace == null) {
                 target.namespace = DEFAULT_NAMESPACE;
             }
             target.index = target.type + "-" + target.dataset + "-" + target.namespace;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static TargetIndex evaluate(
String type,
List<KeyValue> attributes,
@Nullable String receiverName,
List<KeyValue> scopeAttributes,
List<KeyValue> resourceAttributes
) {
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on scope.name
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
target.populateFrom(attributes);
target.populateFrom(scopeAttributes);
target.populateFrom(resourceAttributes);
if (target.index == null) {
target.type = type;
if (target.dataset == null && receiverName != null) {
target.dataset = receiverName;
}
target.dataset = DataStream.sanitizeDataset(target.dataset);
if (target.dataset == null) {
target.dataset = DEFAULT_DATASET;
}
// add otel suffix to match OTel index template
target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
target.namespace = DataStream.sanitizeNamespace(target.namespace);
if (target.namespace == null) {
target.namespace = DEFAULT_NAMESPACE;
}
target.index = target.type + "-" + target.dataset + "-" + target.namespace;
} else {
target.type = null;
target.dataset = null;
target.namespace = null;
}
return target;
}
public static TargetIndex evaluate(
String type,
List<KeyValue> attributes,
@Nullable String receiverName,
List<KeyValue> scopeAttributes,
List<KeyValue> resourceAttributes
) {
Objects.requireNonNull(type, "type must not be null");
Objects.requireNonNull(attributes, "attributes must not be null");
Objects.requireNonNull(scopeAttributes, "scopeAttributes must not be null");
Objects.requireNonNull(resourceAttributes, "resourceAttributes must not be null");
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on receiverName
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
target.populateFrom(attributes);
target.populateFrom(scopeAttributes);
target.populateFrom(resourceAttributes);
if (target.index == null) {
target.type = DataStream.sanitizeType(type);
if (target.dataset == null && receiverName != null) {
target.dataset = receiverName;
}
target.dataset = DataStream.sanitizeDataset(target.dataset);
if (target.dataset == null) {
target.dataset = DEFAULT_DATASET;
}
// add OTEL suffix to match OTel index template (avoid double-suffix)
if (!target.dataset.endsWith(OTEL_DATASET_SUFFIX)) {
target.dataset = target.dataset + OTEL_DATASET_SUFFIX;
}
target.namespace = DataStream.sanitizeNamespace(target.namespace);
if (target.namespace == null) {
target.namespace = DEFAULT_NAMESPACE;
}
target.index = target.type + "-" + target.dataset + "-" + target.namespace;
} else {
target.type = null;
target.dataset = null;
target.namespace = null;
}
return target;
}
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 58-97, the evaluate(...) method should validate and sanitize
inputs: add explicit non-null checks for type, attributes, scopeAttributes, and
resourceAttributes (throw an IllegalArgumentException with a clear message if
any are null), sanitize the incoming type by calling
DataStream.sanitizeType(type) and use that sanitized value for target.type, and
when appending OTEL_DATASET_SUFFIX only add it if target.dataset does not
already end with that suffix (use a null-safe check after sanitization). Also
ensure you handle receiverName possibly null as before and avoid any operations
on null lists to prevent NPEs.

Comment on lines +65 to +70
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on scope.name
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Comment nit: align step description with parameter name

Replace “scope.name” with “receiverName” to match the method contract and upstream usage.

-        // 3. receiver-based routing based on scope.name
+        // 3. receiver-based routing based on receiverName
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on scope.name
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
// Order:
// 1. elasticsearch.index from attributes, scope.attributes, resource.attributes
// 2. read data_stream.* from attributes, scope.attributes, resource.attributes
// 3. receiver-based routing based on receiverName
// 4. use default hardcoded data_stream.* (<type>-generic-default)
TargetIndex target = new TargetIndex();
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 65 to 70, the comment listing routing steps refers to "scope.name"
but the method/parameter is named "receiverName"; update the comment to replace
"scope.name" with "receiverName" so the step description aligns with the method
contract and upstream usage.

Comment on lines +101 to +115
private void populateFrom(List<KeyValue> attributes) {
if (isPopulated()) {
return;
}
for (int i = 0, size = attributes.size(); i < size; i++) {
KeyValue attr = attributes.get(i);
if (attr.getKey().equals(ELASTICSEARCH_INDEX)) {
index = attr.getValue().getStringValue();
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) {
dataset = attr.getValue().getStringValue();
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) {
namespace = attr.getValue().getStringValue();
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Make “first wins” consistent and exit early when elasticsearch.index is found

Currently, multiple ELASTICSEARCH_INDEX entries in the same list cause “last wins,” unlike dataset/namespace. Prefer “first wins” and return immediately once index is set.

     private void populateFrom(List<KeyValue> attributes) {
         if (isPopulated()) {
             return;
         }
         for (int i = 0, size = attributes.size(); i < size; i++) {
             KeyValue attr = attributes.get(i);
-            if (attr.getKey().equals(ELASTICSEARCH_INDEX)) {
-                index = attr.getValue().getStringValue();
+            if (index == null && attr.getKey().equals(ELASTICSEARCH_INDEX)) {
+                index = attr.getValue().getStringValue();
+                if (index != null) {
+                    return; // explicit index takes precedence; stop scanning this list
+                }
             } else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) {
                 dataset = attr.getValue().getStringValue();
             } else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) {
                 namespace = attr.getValue().getStringValue();
             }
         }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private void populateFrom(List<KeyValue> attributes) {
if (isPopulated()) {
return;
}
for (int i = 0, size = attributes.size(); i < size; i++) {
KeyValue attr = attributes.get(i);
if (attr.getKey().equals(ELASTICSEARCH_INDEX)) {
index = attr.getValue().getStringValue();
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) {
dataset = attr.getValue().getStringValue();
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) {
namespace = attr.getValue().getStringValue();
}
}
}
private void populateFrom(List<KeyValue> attributes) {
if (isPopulated()) {
return;
}
for (int i = 0, size = attributes.size(); i < size; i++) {
KeyValue attr = attributes.get(i);
if (index == null && attr.getKey().equals(ELASTICSEARCH_INDEX)) {
index = attr.getValue().getStringValue();
if (index != null) {
return; // explicit index takes precedence; stop scanning this list
}
} else if (dataset == null && attr.getKey().equals(DATA_STREAM_DATASET)) {
dataset = attr.getValue().getStringValue();
} else if (namespace == null && attr.getKey().equals(DATA_STREAM_NAMESPACE)) {
namespace = attr.getValue().getStringValue();
}
}
}
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java
around lines 101 to 115, the current loop allows multiple ELASTICSEARCH_INDEX
attributes to overwrite index (last-wins) while dataset/namespace follow
first-wins; change the index handling to follow first-wins and exit early: only
set index if index == null when encountering ELASTICSEARCH_INDEX, and
immediately break/return once index is set so subsequent attributes cannot
overwrite it, leaving dataset and namespace logic unchanged.

Comment on lines +65 to +67
public static boolean isMappingHintsAttribute(String attributeKey) {
return attributeKey.equals(MAPPING_HINTS);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Null-safety for attribute key check.

Defensively guard against null to avoid accidental NPEs when called from other contexts.

-    public static boolean isMappingHintsAttribute(String attributeKey) {
-        return attributeKey.equals(MAPPING_HINTS);
-    }
+    public static boolean isMappingHintsAttribute(String attributeKey) {
+        return MAPPING_HINTS.equals(attributeKey);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static boolean isMappingHintsAttribute(String attributeKey) {
return attributeKey.equals(MAPPING_HINTS);
}
public static boolean isMappingHintsAttribute(String attributeKey) {
return MAPPING_HINTS.equals(attributeKey);
}
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java
around lines 65-67, the method isMappingHintsAttribute(String attributeKey)
calls attributeKey.equals(MAPPING_HINTS) which can throw NPE if attributeKey is
null; change the check to be null-safe by either using
MAPPING_HINTS.equals(attributeKey) or Objects.equals(attributeKey,
MAPPING_HINTS) (and add an import for java.util.Objects if used), so the method
returns false for null inputs without throwing.

Comment on lines 145 to 147
public static boolean isIgnoredAttribute(String attributeKey) {
return attributeKey.equals(MappingHints.MAPPING_HINTS);
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add a null-guard to avoid accidental NPEs.

Even if OTel keys are non-null, this public method may be reused. Guarding costs nothing.

-    public static boolean isIgnoredAttribute(String attributeKey) {
-        return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
-    }
+    public static boolean isIgnoredAttribute(String attributeKey) {
+        if (attributeKey == null) return false;
+        return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static boolean isIgnoredAttribute(String attributeKey) {
return attributeKey.equals(MappingHints.MAPPING_HINTS);
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
}
public static boolean isIgnoredAttribute(String attributeKey) {
if (attributeKey == null) return false;
return TargetIndex.isTargetIndexAttribute(attributeKey) || MappingHints.isMappingHintsAttribute(attributeKey);
}
🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java
around lines 145-147, the public isIgnoredAttribute method can throw an NPE if
attributeKey is null; add a null-guard at the top (e.g., if (attributeKey ==
null) return false;) before calling TargetIndex.isTargetIndexAttribute(...) and
MappingHints.isMappingHintsAttribute(...), so the method safely returns false
for null inputs.

Comment on lines +18 to 19
import static org.hamcrest.Matchers.equalTo;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Nice coverage for ignored attributes.

This ensures TSID stays stable by excluding routing-only keys.

Add the mapping hints key as well:

@@
-            List.of(
+            List.of(
                 keyValue("elasticsearch.index", "index"),
                 keyValue("data_stream.dataset", "dataset"),
-                keyValue("data_stream.namespace", "namespace")
+                keyValue("data_stream.namespace", "namespace"),
+                keyValue("elasticsearch.mapping.hints", "ignored")
             ),
             funnelBuilder
         );
         assertThat(funnelBuilder.size(), equalTo(0));

Also applies to: 60-70

🤖 Prompt for AI Agents
In
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/tsid/AttributeListTsidFunnelTests.java
around lines 18-19 (and also update the same logic at lines 60-70), the test's
ignored-attributes list is missing the mapping hints key; update the test to
include the mapping hints key in the set of attributes that are excluded from
TSID computation (add the mapping hints constant/string to the ignored
attributes collection where routing-only keys are specified) so the test
verifies TSID stability with that key ignored as well.

@phananh1010 phananh1010 closed this Oct 1, 2025
@phananh1010 phananh1010 reopened this Oct 3, 2025
@phananh1010 phananh1010 closed this Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant